Method for estimating examinee attribute parameters in a cognitive diagnosis model

ABSTRACT

A method and system for determining attribute score levels from an assessment are disclosed. An assessment includes items each testing for at least one attribute. A first distribution is generated having a response propensity represented by a highest level of execution for each attribute tested by the item. An item threshold is determined for at least one score for the first distribution. Each item threshold corresponds to a level of execution corresponding to the score for which the item threshold is determined. For each attribute tested by the item, a second distribution is generated having a response propensity represented by a lowest level of execution for the attribute and the highest level of execution for all other attributes tested by the item. A mean parameter is determined for the second distribution. An attribute score level is determined for the scores based on the item thresholds and the mean parameters.

RELATED APPLICATIONS AND CLAIM OF PRIORITY

This application claims priority to, and incorporates herein by reference, U.S. provisional patent application No. 60/559,922, entitled “A Polynomous Extension of the Fusion Model and Its Bayesian Parameter Estimation,” filed Apr. 6, 2004, and parent application U.S. patent application Ser. No. 11/100,364, entitled “Method For Estimating Examinee Attribute Parameters In A Cognitive Diagnosis Model,” filed Apr. 6, 2005, of which it is a continuation.

TECHNICAL FIELD

The embodiments disclosed herein generally relate to the field of assessment evaluation. The embodiments particularly relate to methods for evaluating assessment examinees on a plurality of attributes based on responses to assessment items.

BACKGROUND

Standardized testing is prevalent in the United States today. Such testing is often used for higher education entrance examinations and achievement testing at the primary and secondary school levels. The prevalence of standardized testing in the United States has been further bolstered by the No Child Left Behind Act of 2001, which emphasizes nationwide test-based assessment of student achievement.

The typical focus of research in the field of assessment measurement and evaluation has been on methods of item response theory (IRT). A goal of IRT is to optimally order examinees along a low-dimensional plane (typically unidimensional) based on the examinee's responses and the characteristics of the test items. The ordering of examinees is done via a set of latent variables presupposed to measure ability. The item responses are generally considered to be conditionally independent of each other.

The typical IRT application uses a test to estimate an examinee's set of abilities (such as verbal ability or mathematical ability) on a continuous scale. An examinee receives a scaled score (a latent trait scaled to some easily understood metric) and/or a percentile rank. The final score (an ordering of examinees along a latent dimension) is used as the standardized measure of competency for an area-specific ability.

Although achieving a partial ordering of examinees remains an important goal in some settings of educational measurement, the practicality of such methods is questionable in common testing applications. For each examinee, the process of acquiring the knowledge that each test purports to measure seems unlikely to occur via this same low-dimensional approach of broadly defined general abilities. This is, at least in part, because such testing can only assess a student's abilities generally, but cannot adequately determine whether a student has mastered a particular ability or not.

Because of this limitation, cognitive modeling methods, also known as skills assessment or skills profiling, have been developed for assessing students' abilities. Cognitive diagnosis statistically evaluates each examinee on the basis of his or her level of competence on an array of skills and uses this evaluation to make relatively fine-grained, categorical teaching and learning decisions about each examinee. Traditional educational testing, such as the use of an SAT score to determine overall ability, performs summative assessment. In contrast, cognitive diagnosis performs formative assessment, which partitions answers for an assessment examination into fine-grained (often discrete or dichotomous) cognitive skills or abilities in order to evaluate an examinee with respect to his or her level of competence for each skill or ability. For example, if a designer of an algebra test is interested in evaluating a standard set of algebra attributes, such as factoring, laws of exponents, quadratic equations and the like, cognitive diagnosis attempts to evaluate each examinee with respect to each such attribute. In contrast, summative analysis simply evaluates each examinee with respect to an overall score on the algebra test.

Numerous cognitive diagnosis models have been developed to attempt to estimate examinee attributes. In cognitive diagnosis models, the atomic components of ability, the specific, finely grained skills (e.g., the ability to multiply fractions, factor polynomials, etc.) that together comprise the latent space of general ability, are referred to as attributes. Due to the high level of specificity in defining attributes, an examinee in a dichotomous model is regarded as either a master or non-master of each attribute. The space of all attributes relevant to an examination is represented by the set {α₁, . . . , α_(K)}. Given a test with items i=1, . . . , I, the attributes necessary for each item can be represented in a matrix of size I×K. This matrix is referred to as a Q-matrix having values Q={q_(ik)}, where q_(ik)=1 when attribute k is required by item i and q_(ik)=0 when attribute k is not required by item i. Typically, the Q-matrix is constructed by experts and is pre-specified at the time of the examination analysis.
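
For illustration only, the following sketch encodes a small hypothetical Q-matrix in Python (the items, attributes, and entries are invented for this example and are not taken from any particular assessment):

```python
import numpy as np

# Hypothetical Q-matrix for I = 5 items and K = 3 attributes
# (e.g., factoring, laws of exponents, quadratic equations).
# Q[i, k] = 1 when attribute k is required by item i, and 0 otherwise.
Q = np.array([
    [1, 0, 0],  # item 1 requires only attribute 1
    [1, 1, 0],  # item 2 requires attributes 1 and 2
    [0, 0, 1],  # item 3 requires only attribute 3
    [0, 1, 1],  # item 4 requires attributes 2 and 3
    [1, 1, 1],  # item 5 requires all three attributes
])

# Attributes required by item 2 (row index 1 with 0-based indexing):
print(np.flatnonzero(Q[1]))  # -> [0 1], i.e., attributes 1 and 2
```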

Cognitive diagnosis models can be sub-divided into two classifications: compensatory models and conjunctive models. Compensatory models allow examinees who are non-masters of one or more attributes to compensate by being masters of other attributes. An exemplary compensatory model is the common factor model, in which high scores on some factors can compensate for low scores on other factors.

Numerous compensatory cognitive diagnosis models have been proposed, including: (1) the Linear Logistic Test Model (LLTM), which models cognitive facets of each item but does not provide information regarding the attribute mastery of each examinee; (2) the Multicomponent Latent Trait Model (MLTM), which determines the attribute features for each examinee but does not provide information regarding items; (3) the Multiple Strategy MLTM, which can be used to estimate examinee performance for items having multiple solution strategies; and (4) the General Latent Trait Model (GLTM), which estimates characteristics of the attribute space with respect to examinees and item difficulty.

Conjunctive models, on the other hand, do not allow for compensation when critical attributes are not mastered. Such models more naturally apply to cognitive diagnosis due to the cognitive structure defined in the Q-matrix and will be considered herein. Such conjunctive cognitive diagnosis models include: (1) the DINA (deterministic inputs, noisy “AND” gate) model, which requires the mastery of all attributes by the examinee for a given examination item; (2) the NIDA (noisy inputs, deterministic “AND” gate) model, which decreases the probability of answering an item for each attribute that is not mastered; (3) the Disjunctive Multiple Classification Latent Class Model (DMCLCM), which models the application of non-mastered attributes to incorrectly answered items; (4) the Partially Ordered Subset Models (POSET), which include a component relating the set of Q-matrix defined attributes to the items by a response model and a component relating the Q-matrix defined attributes to a partially ordered set of knowledge states; and (5) the Unified Model, which combines the Q-matrix with terms intended to capture the influence of incorrectly specified Q-matrix entries.

The Unified Model specifies the probability of correctly answering an item X_(ij) for a given examinee j, item i, and set of attributes k=1, . . . , K as:

${{P\left( {{X_{ij} = {1\alpha_{j}}},\theta_{j}} \right)} = {\left( {1 - p} \right)\left\lbrack {{d_{j}{\prod\limits_{k = 1}^{K}{\pi_{ik}^{\alpha_{jk}{xq}_{ik}}r_{ik}^{({1 - {\alpha_{jk}{xq}_{ik}}})}{P_{i}\left( {\theta_{j} + {\Delta \; c_{i}}} \right)}}}} + {\left( {1 - d_{i}} \right){P_{i}\left( \theta_{j} \right)}}} \right\rbrack}},$

where

θ_(j) is the latent trait of examinee j; p is the probability of an erroneous response by an examinee that is a master; d_(i) is the probability of selecting the pre-defined Q-matrix strategy for item i;

π_(ik) is the probability of correctly applying attribute k to item i given mastery of attribute k; r_(ik) is the probability of correctly applying attribute k to item i given non-mastery of attribute k; α_(jk) is an examinee attribute mastery level; and c_(i) is a value indicating the extent to which the Q-matrix entry for item i spans the latent attribute space.

One problem with the Unified Model is that the number of parameters per item makes the model unidentifiable. The Reparameterized Unified Model (RUM) attempted to reparameterize the Unified Model in a manner consistent with the original interpretation of the model parameters. For a given examinee j, item i, and Q-matrix defined set of attributes k=1, . . . , K, the RUM specifies the probability of correctly answering item X_(ij) as:

${{P\left( {{X_{ij}\alpha_{j}},\theta_{j}} \right)} = {\pi_{i}^{*}{\prod\limits_{k = 1}^{K}{r_{ik}^{*{({1 - \alpha_{jk}})}{xq}_{ik}}{P_{c_{i}}\left( \theta_{j} \right)}}}}},$

where

$\pi_{i}^{*} = {\prod\limits_{k = 1}^{K}\pi_{ik}^{q_{ik}}}$

(the probability of correctly applying all K Q-matrix specified attributes for item i),

$r_{ik}^{*} = \frac{r_{ik}}{\pi_{ik}}$

(the penalty imposed for not mastering attribute k), and

${P_{c_{i}}\left( \theta_{j} \right)} = \frac{^{({\theta_{j} + c_{i}})}}{1 + ^{({\theta_{j} + c_{i}})}}$

(a measure of the completeness of the model).

The RUM is a compromise of the Unified Model parameters that allows the estimation of both latent examinee attribute patterns and test item parameters.

Another cognitive diagnosis model, derived from the RUM, is the Fusion Model. In the Fusion Model, the examinee parameters are defined as α_(j), a K-element vector representing examinee j's mastery/non-mastery status on each of the attributes specified in the Q-matrix. For example, if a test measures five skill attributes, an examinee's α_(j) vector might be ‘11010’, implying mastery of skill attributes 1, 2 and 4, and non-mastery of attributes 3 and 5. The examinee variable θ_(j) is normalized as in traditional IRT applications (mean of 0, variance of 1). The probability that examinee j answers item i correctly is expressed as:

${P\left( {{X_{ij} = {1{\underset{\_}{\alpha}j}}},\theta_{j}} \right)} = {\pi_{i}^{*}{\prod\limits_{k = 1}^{K}{r_{ik}^{*{({1 - \alpha_{jk}})}{xq}_{ik}}{P_{c_{i}}\left( \theta_{j} \right)}}}}$

where

π*_(i) is the probability of correctly applying all K Q-matrix specified attributes for item i, given that an examinee is a master of all of the attributes required for the item,

r*_(ik) is the ratio of (1) the probability of successfully applying attribute k on item i given that an examinee is a non-master of attribute k and (2) the probability of successfully applying attribute k on item i given that an examinee is a master of attribute k, and

${P_{c_{i}}\left( \theta_{j} \right)} = \frac{1}{1 + ^{- {({\theta_{j} + c_{i}})}}}$

is the Rasch Model with easiness parameter c_(i)(0≦c_(i)≦3) for item i.

Based on this equation, it is common to distinguish two components of the Fusion Model: (1) the diagnostic component:

${\pi_{i}^{*}{\prod\limits_{k = 1}^{K}t_{ik}^{*{({1 - \alpha_{jk}})}{xq}_{ik}}}},$

which is concerned with the influence of the skill attributes on item performance, and (2) the residual component, P_(c_i)(θ_(j)), which is concerned with the influence of the residual ability. These components interact conjunctively in determining the probability of a correct response. That is, successful execution of both the diagnostic and residual components of the model is needed to achieve a correct response on the item.

The r*_(ik) parameter assumes values between 0 and 1 and functions as a discrimination parameter in describing the power of the ith item in distinguishing masters from non-masters on the kth attribute. The r*_(ik) parameter functions as a penalty by imposing a proportional reduction in the probability of correct response (for the diagnostic part of the model) for a non-master of the attribute, assuming the attribute is needed to solve the item. The c_(i) parameters are completeness indices, indicating the degree to which the attributes specified in the Q-matrix are “complete” in describing the skills needed to successfully execute the item. Values of c_(i) close to 3 represent items with high levels of completeness; values close to 0 represent items with low completeness.
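
A minimal sketch, in Python, of the dichotomous Fusion Model probability as defined above (this is an illustrative reading of the equation, not the patent's implementation; the function names and numeric values are invented):

```python
import numpy as np

def rasch_easiness(theta_j, c_i):
    """Residual component P_{c_i}(theta_j) = 1 / (1 + exp(-(theta_j + c_i)))."""
    return 1.0 / (1.0 + np.exp(-(theta_j + c_i)))

def fusion_prob_correct(alpha_j, q_i, pi_star_i, r_star_i, c_i, theta_j):
    """P(X_ij = 1 | alpha_j, theta_j) for a dichotomously scored item.

    alpha_j   : 0/1 mastery vector for examinee j (length K)
    q_i       : 0/1 Q-matrix row for item i (length K)
    pi_star_i : probability of applying all required attributes given full mastery
    r_star_i  : penalties r*_ik in (0, 1] (length K)
    c_i       : completeness/easiness parameter, 0 <= c_i <= 3
    theta_j   : residual ability of examinee j
    """
    # Diagnostic component: apply the penalty r*_ik once for each required,
    # non-mastered attribute, i.e. exponent (1 - alpha_jk) * q_ik.
    diagnostic = pi_star_i * np.prod(r_star_i ** ((1 - alpha_j) * q_i))
    # Conjunctive interaction with the residual (Rasch) component.
    return diagnostic * rasch_easiness(theta_j, c_i)

# Item requiring attributes 1 and 2; examinee has mastered only attribute 1.
p = fusion_prob_correct(alpha_j=np.array([1, 0]), q_i=np.array([1, 1]),
                        pi_star_i=0.9, r_star_i=np.array([0.5, 0.4]),
                        c_i=1.0, theta_j=0.0)
print(round(p, 3))  # 0.9 * 0.4 * P_c(0.0) ≈ 0.263
```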

The item parameters in the Fusion Model have a prior distribution that is a Beta distribution, β(a, b), where (a, b) are defined for each set of item parameters, π*, r*, and c/3. Each set of hyperparameters is then estimated within the Markov chain Monte Carlo (MCMC) chain to determine the shape of the prior distribution.

One difference between the RUM and the Fusion Model is that the α_(jk) term is replaced in the Fusion Model with a binary indicator function, I({tilde over (α)}_(jk)>κ_(k)), where {tilde over (α)}_(jk) is the underlying continuous variable of examinee j for attribute k (i.e., an examinee attribute value), and κ_(k) is the mastery threshold value that {tilde over (α)}_(jk) must exceed for α_(jk)=1.

MCMC algorithms estimate the set of item (b) and latent examinee (θ) parameters by using a stationary Markov chain, (A⁰, A¹, A², . . . ), with A^(t)=(b^(t), θ^(t)). The individual steps of the chain are determined according to the transition kernel, which is the probability of a transition from state t to state t+1, P[(b^(t+1), θ^(t+1))|(b^(t), θ^(t))]. The goal of the MCMC algorithm is to use a transition kernel that will allow sampling from the posterior distribution of interest. The process of sampling from the posterior distribution can be evaluated by sampling from the distribution of each of the different types of parameters separately. Furthermore, each of the individual elements of the vector can be sampled separately. Accordingly, the posterior distribution to be sampled for the item parameters is P(b_(i)|X, θ) (across all i), and the posterior distribution to be sampled for the examinee parameters is P(θ_(j)|X, b) (across all j).

One problem with MCMC algorithms is that the choice of a proposal distribution is critical to the number of iterations required for convergence of the Markov chain. A critical measure of effectiveness of the choice of proposal distribution is the proportion of proposals that are accepted within the chain. If the proportion is low, then many unreasonable values are proposed, and the chain moves very slowly towards convergence. Likewise, if the proportion is very high, the values proposed are too close to the values of the current state, and the chain will converge very slowly.
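
The effect described above can be seen in a generic random-walk Metropolis sketch (a simplified, hypothetical stand-in for illustration, not the estimation procedure contemplated here):

```python
import numpy as np

def random_walk_metropolis(log_post, x0, proposal_sd, n_iter=5000, seed=0):
    """Generic random-walk Metropolis sampler; returns samples and acceptance rate."""
    rng = np.random.default_rng(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_iter):
        proposal = x + rng.normal(scale=proposal_sd)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < log_post(proposal) - log_post(x):
            x, accepted = proposal, accepted + 1
        samples.append(x)
    return np.array(samples), accepted / n_iter

# Standard normal target: a moderate proposal scale gives a reasonable acceptance
# rate; very small or very large scales both slow convergence.
log_post = lambda x: -0.5 * x**2
for sd in (0.1, 2.5, 25.0):
    _, rate = random_walk_metropolis(log_post, x0=0.0, proposal_sd=sd)
    print(f"proposal sd = {sd:5.1f} -> acceptance rate ≈ {rate:.2f}")
```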

While MCMC algorithms suffer from the same pitfalls as joint maximum likelihood (JML) optimization algorithms, such as no guarantee of consistent parameter estimates, a potential strength of the MCMC approaches is the reporting of examinee (binary) attribute estimates as posterior probabilities. Thus, MCMC algorithms can provide a more practical way of investigating cognitive diagnosis models.

Different methods of sampling values from the complete conditional distributions of the parameters of the model include the Gibbs sampling algorithm and the Metropolis-Hastings within Gibbs (MHG) algorithm. Each of the cognitive diagnosis models fit with MCMC used the MHG algorithm to evaluate the set of examinee variables because the Gibbs sampling algorithm requires the computation of a normalizing constant. A disadvantage of the MHG algorithm is that the set of examinee parameters is considered within a single block (i.e., only one parameter is variable while other variables are fixed). While the use of blocking speeds up the convergence of the MCMC chain, efficiency may be reduced. For example, attributes with large influences on the likelihood may overshadow values of individual attributes that are not as large.

One problem with current cognitive diagnosis models is that they do not adequately evaluate examinees on more than two skill levels, such as master and non-master. While some cognitive diagnosis models do attempt to evaluate examinees on three or more skill levels, the number of variables used by such models is excessive.

Accordingly, what is needed is a method for performing cognitive diagnosis using a model that evaluates examinees on individual skills using polytomous attribute skill levels.

A further need exists for a method that considers each attribute separately when assessing examinees.

A still further need exists for a method of classifying examinees using a reduced variable set for polytomous attribute skill levels.

The present disclosure is directed to solving one or more of the above-listed problems.

SUMMARY

Before the present methods, systems and materials are described, it is to be understood that this invention is not limited to the particular methodologies, systems and materials described, as these may vary. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the invention, which will be limited only by the appended claims.

It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, reference to an “attribute” is a reference to one or more attributes and equivalents thereof known to those skilled in the art, and so forth. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. Although any methods, materials, and devices similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, the preferred methods, materials, and devices are now described. All publications mentioned herein are incorporated by reference. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.

In an embodiment, a method for determining attribute score levels from an assessment may include, for at least one item on the assessment, each testing for at least one attribute: generating a first distribution having a response propensity represented by a highest level of execution for each attribute tested by the item; determining an item threshold for at least one score for the first distribution corresponding to a level of execution corresponding to the score; generating a second distribution for at least one attribute tested by the item having a response propensity represented by a lowest level of execution for the attribute and the highest level of execution for all other attributes tested by the item; determining a mean parameter for the second distribution; and determining an attribute score level for at least one score based on the at least one item threshold and the at least one mean parameter.

In an embodiment, a method for determining one or more examinee attribute mastery levels from an assessment may include receiving a covariate vector for an examinee, the covariate vector including a value for each of one or more covariates for the examinee, and, for each of one or more attributes, computing an examinee attribute value based on at least the covariate vector and one or more responses made by the examinee to one or more questions pertaining to the attribute on an assessment, and assigning an examinee attribute mastery level for the examinee with respect to the attribute based on whether the examinee attribute value surpasses one or more thresholds.

In an embodiment, a system for determining attribute score levels from an assessment may include a processor and a processor-readable storage medium in communication with the processor. The processor-readable storage medium may contain one or more programming instructions for performing a method of determining attribute score levels from an assessment including, for at least one item on the assessment, each testing for at least one attribute: generating a first distribution having a response propensity represented by a highest level of execution for each attribute tested by the item; for at least one score, determining an item threshold for the first distribution corresponding to a level of execution corresponding to the score; for at least one attribute tested by the item, generating a second distribution having a response propensity represented by a lowest level of execution for the attribute and the highest level of execution for all other attributes tested by the item, and determining a mean parameter for the second distribution; and determining an attribute score level for at least one score based on the at least one item threshold and the at least one mean parameter.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects, features, benefits and advantages of the embodiments of the present invention will be apparent with regard to the following description, appended claims and accompanying drawings where:

FIG. 1 illustrates an exemplary parameterization for the diagnostic part of a model for dichotomously scored items according to an embodiment.

FIG. 2 illustrates an exemplary parameterization for the diagnostic part of a model for polytomously scored items according to an embodiment.

FIG. 3 is a block diagram of exemplary internal hardware that may be used to contain or implement program instructions according to an embodiment.

DETAILED DESCRIPTION

The present disclosure discusses embodiments of the Fusion Model, described above, extended to cover polytomous attribute skill levels. The disclosed embodiments may generalize and extend the teachings of the Fusion Model for polytomously-scored items with ordered score categories.

In an embodiment, the cumulative score probabilities of polytomously-scored M-category items may be expressed as follows:

$\begin{matrix}{P_{im}^{*}\left( {\underline{\alpha}}_{j},\theta_{j} \right) = P\left( X_{ij} \geq m \mid {\underline{\alpha}}_{j},\theta_{j} \right) = \left\{ \begin{matrix}1 & {m = 0} \\ {\pi_{im}^{*}\prod\limits_{k = 1}^{K}r_{imk}^{*\left( 1 - \alpha_{jk} \right)q_{ik}}\; P_{c_{im}}\left( \theta_{j} \right)} & {m = 1,\ldots,M_{i} - 1} \end{matrix} \right.} & (1)\end{matrix}$

resulting in item score probabilities that may be expressed as follows:

$\begin{matrix}{{P_{im}\left( {{\underset{\_}{\alpha}}_{j},\theta_{j}} \right)} = {{P\left( {{X_{ij} = {m{\underset{\_}{\alpha}}_{j}}},\theta_{j}} \right)} = \left\{ \begin{matrix}{{P_{im}^{*}\left( {{\underset{\_}{\alpha}}_{j},\theta_{j}} \right)} - {P_{i{({m + 1})}}^{*}\left( {{\underset{\_}{\alpha}}_{j},\theta_{j}} \right)}} & {m = \left( {0,\ldots \mspace{14mu},{M_{i} - 2}} \right)} \\{P_{im}^{*}\left( {{\underset{\_}{\alpha}}_{j},\theta_{j}} \right)} & {m = {M_{i} - 1}}\end{matrix} \right.}} & (2)\end{matrix}$

where

π*_(im) is the probability of sufficiently applying all item i required attributes to achieve a score of at least m, given that an examinee has mastered all required attributes for the item (π*_(i1)≧π*_(i2)≧ . . . ≧π*_(i(M_i−1)));

r*_(imk) is the ratio of (1) the probability of sufficiently applying attribute k required for item i to achieve a score of at least m given that an examinee is a non-master of attribute k, and (2) the probability of sufficiently applying attribute k required for item i to achieve a score of at least m given that an examinee is a master of attribute k (r*_(i1k)≧r*_(i2k)≧ . . . ≧r*_(i(M_i−1)k)); and

P_(c_im)(θ_(j)) is a Rasch model probability with easiness parameter c_(im), m=1, . . . , M_(i)−1. The easiness parameters are ordered such that c_(i1)>c_(i2)> . . . >c_(i(M_i−1)).

A feature of the Fusion Model—its synthesis of a diagnostic modeling component with a residual modeling component—may be seen in Equation (1). In the dichotomous case, each item requires successful execution of both the diagnostic and residual parts of the model; that is, an overall correct response to an item occurs only when both latent responses are positive. In the polytomous case disclosed herein, where multiple score categories may be used, a different metric may be relevant. Instead of a correct response, the polytomous case may calculate whether an examinee's execution is sufficient to achieve a score of at least m, where m=0, 1, . . . , M−1 (assuming an M-category item is scored 0, 1, . . . , M−1). In other words, if the separate latent responses to the diagnostic and residual parts of the model are being scored 0, 1, 2, . . . , M−1, an examinee may only receive a score of m or higher on the item when both latent responses are m or higher. When translated to actual item score probabilities in Equation (2), an examinee may achieve a score that is the minimum of what is achieved across both parts of the model.
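
A minimal Python sketch of Equations (1) and (2) for a single item and examinee (the parameter values and function names below are hypothetical, chosen only to illustrate the calculation):

```python
import numpy as np

def cumulative_score_probs(alpha_j, q_i, pi_star, r_star, c, theta_j):
    """Cumulative probabilities P*_im = P(X_ij >= m), Equation (1).

    pi_star : pi*_im for m = 1..M-1 (non-increasing in m)
    r_star  : r*_imk, shape (M-1, K)
    c       : easiness parameters c_im for m = 1..M-1 (decreasing in m)
    """
    M = len(pi_star) + 1
    exponents = (1 - alpha_j) * q_i              # (1 - alpha_jk) * q_ik
    p_star = np.ones(M)                          # P*_i0 = 1 by definition
    for m in range(1, M):
        diagnostic = pi_star[m - 1] * np.prod(r_star[m - 1] ** exponents)
        residual = 1.0 / (1.0 + np.exp(-(theta_j + c[m - 1])))
        p_star[m] = diagnostic * residual
    return p_star

def item_score_probs(p_star):
    """Item score probabilities P_im, Equation (2): adjacent differences."""
    return np.append(p_star[:-1] - p_star[1:], p_star[-1])

# Hypothetical three-category item (scores 0, 1, 2) requiring two attributes;
# the examinee has mastered attribute 1 but not attribute 2.
p_star = cumulative_score_probs(alpha_j=np.array([1, 0]), q_i=np.array([1, 1]),
                                pi_star=np.array([0.9, 0.6]),
                                r_star=np.array([[0.6, 0.5], [0.5, 0.4]]),
                                c=np.array([1.5, 0.5]), theta_j=0.0)
print(item_score_probs(p_star))  # probabilities of scores 0, 1, 2; they sum to 1
```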

Controlling the number of new parameters introduced to a polytomous cognitive diagnosis model is important in order to develop a computable model. If too many parameters exist, the processing power needed to compute examinee attribute skill levels using the model may be excessive. Based on Equation (1), every score category in every item (with the exception of the first score category) may include a π*_(im), a c_(im), and as many r*_(imk) parameters as there are attributes needed to solve the item. This may result in too many parameters per item to make estimation feasible.

An alternate parameterization may be used to introduce a mechanism by which realistic constraints may be imposed on the diagnosis-related item parameters (the π*'s and r*'s), while also ensuring that all score category probabilities remain positive for examinees of all latent attribute mastery patterns and all residual ability levels.

FIG. 1 illustrates an exemplary parameterization for the diagnostic part of the model for dichotomously scored items according to an embodiment. As shown in FIG. 1, item i requires two attributes (attributes 1 and 2). Underlying normal distributions may represent the likelihood that an examinee in a particular class successfully executes all required attributes in solving the item. For example, the classes may include (1) examinees that have mastered both attributes 1 and 2 105; (2) examinees that have mastered attribute 1, but not attribute 2 110; and (3) examinees that have mastered attribute 2, but not attribute 1 115. An item threshold τ_(i1) 120 may define the location corresponding to the level of execution needed for a correct response. Accordingly, the area under the normal curve 105 above τ_(i1) for examinees that have mastered both attributes may be equivalent to π*_(i) in the Fusion Model. The second normal distribution 110 may represent examinees who have mastered attribute 1, but not attribute 2. The second normal distribution 110 may have a mean parameter μ_(i1) 125 that is constrained to be less than 0 (the mean of the response propensity distribution for masters of both attributes), and a fixed variance of 1. The area above τ_(i1) for this class may be equal to π*_(i)×r*_(i2) in the ordinary Fusion Model parameterization. The third normal distribution 115 may represent examinees who have mastered attribute 2, but not attribute 1. The third normal distribution 115 may have a mean parameter μ_(i2) 130 that is constrained to be less than 0 (the mean of the response propensity distribution for masters of both attributes), and a fixed variance of 1. The area above τ_(i1) for this class may be equal to π*_(i)×r*_(i1) in the ordinary Fusion Model parameterization. As in the Fusion Model, the probability that an examinee that has not mastered either attribute will successfully execute them is equal to π*_(i)×r*_(i1)×r*_(i2).

As such, three parameters may be estimated for this item in the parameterization: τ_(i1) 120, μ_(i1) 125, and μ_(i2) 130. Each of these parameters may be directly translated into π*_(i), r*_(i1) and r*_(i2) based on the usual parameterization of the Fusion Model. The three classes considered above may thus be sufficient to determine the π*_(i), r*_(i1), and r*_(i2) parameters, which may be applied to determine the diagnostic component probability for the class of examinees that are non-masters of both attributes. In general, it may only be necessary to determine as many μ parameters as there are attributes for the item.
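
For the two-attribute item of FIG. 1, the translation from τ_(i1), μ_(i1), and μ_(i2) to π*_(i), r*_(i1), and r*_(i2) may be sketched as follows (the numeric values are hypothetical; unit-variance normal response-propensity distributions are assumed, as in the description above):

```python
from scipy.stats import norm

# Hypothetical values for the FIG. 1 item (two attributes, one threshold).
tau_i1 = -0.2   # item threshold for a correct response
mu_i1 = -1.0    # propensity mean for masters of attribute 1 only
mu_i2 = -1.5    # propensity mean for masters of attribute 2 only

# Areas above the threshold under each unit-variance normal distribution.
pi_star_i = 1 - norm.cdf(tau_i1)                   # masters of both attributes
p_master_1_only = 1 - norm.cdf(tau_i1, loc=mu_i1)  # equals pi*_i * r*_i2
p_master_2_only = 1 - norm.cdf(tau_i1, loc=mu_i2)  # equals pi*_i * r*_i1

r_star_i2 = p_master_1_only / pi_star_i
r_star_i1 = p_master_2_only / pi_star_i

# Diagnostic probability for examinees who have mastered neither attribute.
p_neither = pi_star_i * r_star_i1 * r_star_i2
print(pi_star_i, r_star_i1, r_star_i2, p_neither)
```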

By parameterizing the model in this manner, the number of parameters for polytomously-scored items may be minimized. In a polytomously-scored item, additional item threshold parameters τ_(i2), τ_(i3), . . . , τ_(i(M-1)) may be added for an M-category item (along with the additional easiness parameters c_(i2), c_(i3), . . . , c_(i(M-1)) for the residual part). The area under each normal distribution may be separated into M regions. The area of each region may represent a function of the π*'s and r*'s needed to reproduce the cumulative score probabilities in Equation (1).

For example, as shown in FIG. 2, a three-category item (item scores 0, 1, and 2) may include two attributes. FIG. 2 is analogous to FIG. 1 except that an additional threshold parameter is added to account for the added score category. The cumulative score probabilities in Equation (1) may be a function of both a diagnostic component and a residual component. For examinees that have mastered both required attributes (i.e., examinees whose response propensities are represented by the top distribution), the probability of executing the attributes sufficiently well to achieve a score of at least 1 may be given by the area above the first threshold τ_(i1) 120 under the normal distribution 205. The probability of executing the attributes sufficiently well to achieve a score of at least 2 is given by the area above the second threshold τ_(i2) 220 under the normal distribution 205. For examinees that have failed to master the second attribute only, the areas above τ_(i1) and τ_(i2) in the second distribution 210 may likewise represent the probabilities of executing the attributes sufficiently well to obtain scores of at least 1 and 2, respectively. For examinees that have failed to master the first attribute only, the areas above τ_(i1) and τ_(i2) in the third distribution 215 may likewise represent the probabilities of executing the attributes sufficiently well to obtain scores of at least 1 and 2, respectively.

A Bayesian estimation strategy for the model presented in Equations (1) and (2) may be formally specified using the τ, μ, and c parameters that are estimated. The π*'s and r*'s may then be derived from these parameters. The τ, μ, and c parameters may be assigned non-informative uniform priors with order constraints to ensure positive score category probabilities under all conditions. For example, the following priors may be assigned:

τ_(i1)˜Unif(−5,5),

τ_(im)˜Unif(τ_(i(m-1)),5), for m=(2, . . . , M_(i)−1)

c_(i1)˜Unif(0,3),

c_(im)˜Unif(0,c_(i(m-1))), for m=(2, . . . , M_(i)−1)

μ_(ik)˜Unif(−10,0) for k=(1, . . . , K_(i)) where K_(i) is the number of attributes required by item i=(1, . . . , I) in the Q-matrix.

From these parameters, the more traditional polytomous Fusion Model parameters in Equation (1) may be derived as follows:

π*_(im)=1−Φ(τ_(im)) for m=(1, . . . , M_(i)−1), where Φ denotes the cumulative distribution function (CDF) of a standard normal distribution; and

r*_(imk)=[1−Φ(τ_(im)−μ_(ik))]/π*_(im) for m=(1, . . . , M_(i)−1) and k=(1, . . . , K_(i)).

The quantile range (−5, 5) may cover 99.99% of the area under a standard normal curve. This may imply vague priors between 0 and 1 for all π*_(im) and r*_(imk).
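
A sketch of drawing these priors with their order constraints and deriving the polytomous Fusion Model parameters (the score-category count, attribute count, and random seed are arbitrary choices for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
M_i, K_i = 3, 2   # hypothetical: three score categories, two required attributes

# Draw the priors subject to the stated order constraints.
tau = np.empty(M_i - 1)
tau[0] = rng.uniform(-5, 5)                # tau_i1 ~ Unif(-5, 5)
for m in range(1, M_i - 1):
    tau[m] = rng.uniform(tau[m - 1], 5)    # tau_im ~ Unif(tau_i(m-1), 5)

c = np.empty(M_i - 1)
c[0] = rng.uniform(0, 3)                   # c_i1 ~ Unif(0, 3)
for m in range(1, M_i - 1):
    c[m] = rng.uniform(0, c[m - 1])        # c_im ~ Unif(0, c_i(m-1))

mu = rng.uniform(-10, 0, size=K_i)         # mu_ik ~ Unif(-10, 0)

# Derive the traditional polytomous Fusion Model parameters.
pi_star = 1 - norm.cdf(tau)                                              # pi*_im
r_star = (1 - norm.cdf(tau[:, None] - mu[None, :])) / pi_star[:, None]   # r*_imk

print(pi_star)   # non-increasing across score categories
print(r_star)    # values in (0, 1); one row per score category, one column per attribute
```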

The correlational structure of the examinee attributes α_(j) may be modeled through the introduction of a multivariate vector of continuous variables {tilde over (α)}_(j) that is assumed to underlie the dichotomous attributes α_(j). Similar to the theory underlying the computation of tetrachoric correlations, {tilde over (α)}_(j) may be assumed to be multivariate normal, with mean 0, a covariance matrix having diagonal elements of 1, and all correlations estimated. A K-element vector κ may determine the thresholds along {tilde over (α)}_(j) that distinguish masters from non-masters on each attribute. Accordingly, the vector κ may control the proportion of masters on each attribute (p_(k)), where higher settings imply a smaller proportion of masters. Each element of κ may be assigned a normal prior with mean 0 and variance 1. Likewise, for the residual parameters θ_(j), normal priors may be imposed having mean 0 and variance 1.

In an embodiment, a covariance matrix Σ may be used instead of the correlation matrix to specify the joint multivariate normal distribution for the ã's and θ's for each examinee. This covariance matrix may be assigned a non-informative inverse-Wishart prior with K+1 degrees of freedom and symmetric positive definite (K+1)×(K+1) scale matrix R, Σ˜Inv-Wishart_(K+1)(R). An informative inverse-Wishart prior for Σ may also be used by choosing a larger number of degrees of freedom (DF) relative to the number of examinees, and scale matrix R=E(R)*(DF−K−2), where E(R) is the anticipated covariance (or correlation) matrix. Because the ã_(jk) are latent, they may have no predetermined metric. Accordingly, their variances may not be identified. However, such variances may only be required in determining α_(jk). This indeterminacy may not affect the determination of the dichotomous α_(jk) since the threshold κ_(k) may adjust according to the variance of ã_(jk). This may result because the sampling procedure used for MCMC estimation may sample parameters from their full conditional distribution such that κ_(k) is sampled conditionally upon {tilde over (α)}_(jk). As a result, if the variances drift over the course of the chain, the κ_(k) may tend to follow the variance drift such that the definition of attribute mastery remains largely consistent (assuming the mastery proportions are estimable). The latent attribute correlation matrix may be derived from the covariance matrix once an MCMC chain has finished.
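
A simulation sketch of this latent structure (the attribute count, examinee count, and identity scale matrix below are arbitrary illustrative choices, not values from the disclosure):

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(11)
K, J = 3, 1000   # hypothetical: three attributes, 1000 examinees

# Non-informative inverse-Wishart prior for the (K+1)x(K+1) covariance of
# (alpha_tilde_j, theta_j), with K+1 degrees of freedom and scale matrix R.
R = np.eye(K + 1)
Sigma = invwishart.rvs(df=K + 1, scale=R, random_state=11)

# Continuous underlying variables (alpha_tilde_j, theta_j) for each examinee.
latent = rng.multivariate_normal(mean=np.zeros(K + 1), cov=Sigma, size=J)
alpha_tilde, theta = latent[:, :K], latent[:, K]

# Dichotomize with attribute-specific mastery thresholds kappa_k; a higher
# kappa_k implies a smaller proportion of masters p_k on attribute k.
kappa = rng.normal(0, 1, size=K)           # kappa_k ~ N(0, 1) prior
alpha = (alpha_tilde > kappa).astype(int)
print(alpha.mean(axis=0))                  # empirical proportion of masters per attribute
```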

In an embodiment, a covariance structure may be applied for the latent attribute correlations. For example, since many tests are substantially unidimensional in nature, the latent attribute correlations may conform to a single factor model. For an examinee j and an attribute k, this may be expressed as:

{tilde over (α)}_(jk)=λ_(k)F_(j)+e_(jk),

where

F_(j) is the level on the second order factor underlying the attribute correlations for examinee j, specified to have mean 0 and variance 1;

λ_(k) represents the factor loading for attribute k on the second order factor; and

e_(jk) represents a uniqueness term with mean 0 across examinees and variance Ψ_(k).

Accordingly, a new matrix Σ* based on the factor loadings and uniqueness variances may be used to replace the covariance matrix Σ described above. The λ parameters may be sampled for each attribute in place of the covariance matrix Σ. In addition, Ψ_(k) may be set to (1−λ_(k)²). As such, a consistent metric for the {tilde over (α)}_(jk) parameters may be imposed with a variance of 1. In an embodiment, a uniform prior may be imposed on each λ_(k) with bounds of, for example, 0.2 and 1.0.
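
A simulation sketch of the single-factor structure for the latent attributes (the attribute count, examinee count, and seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
K, J = 4, 5000   # hypothetical: four attributes, 5000 examinees

# Factor loadings lambda_k, drawn here from the Unif(0.2, 1.0) prior noted above.
lam = rng.uniform(0.2, 1.0, size=K)
psi = 1.0 - lam**2               # uniqueness variances, so Var(alpha_tilde_jk) = 1

# alpha_tilde_jk = lambda_k * F_j + e_jk, with F_j ~ N(0, 1) and e_jk ~ N(0, psi_k).
F = rng.normal(0, 1, size=J)
e = rng.normal(0, np.sqrt(psi), size=(J, K))
alpha_tilde = F[:, None] * lam[None, :] + e

# The model-implied correlation between attributes k and l (k != l) is lambda_k * lambda_l.
print(np.corrcoef(alpha_tilde, rowvar=False).round(2))   # empirical correlations
print(np.outer(lam, lam).round(2))                       # implied values (off-diagonal)
```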

FIG. 3 is a block diagram of exemplary internal hardware that may be used to contain or implement program instructions according to an embodiment. Referring to FIG. 3, a bus 328 serves as the main information highway interconnecting the other illustrated components of the hardware. CPU 302 is the central processing unit of the system, performing calculations and logic operations required to execute a program. Read only memory (ROM) 318 and random access memory (RAM) 320 constitute exemplary memory devices.

A disk controller 304 interfaces one or more optional disk drives to the system bus 328. These disk drives may be external or internal floppy disk drives such as 310, CD ROM drives 306, or external or internal hard drives 308. As indicated previously, these various disk drives and disk controllers are optional devices.

Program instructions may be stored in the ROM 318 and/or the RAM 320. Optionally, program instructions may be stored on a computer-readable medium such as a floppy disk or a digital disk or other recording medium, a communications signal or a carrier wave.

An optional display interface 322 may permit information from the bus 328 to be displayed on the display 324 in audio, graphic or alphanumeric format. Communication with external devices may optionally occur using various communication ports 326. An exemplary communication port 326 may be attached to a communications network, such as the Internet or an intranet.

In addition to the standard computer-type components, the hardware may also include an interface 312 which allows for receipt of data from input devices such as a keyboard 314 or other input device 316 such as a remote control, pointer and/or joystick.

An embedded system may optionally be used to perform one, some or all of the disclosed operations. Likewise, a multiprocessor system may optionally be used to perform one, some or all of the disclosed operations.

As such, those skilled in the art will appreciate that the conception upon which this disclosure is based may readily be utilized as a basis for the designing of other structures, methods and systems for carrying out the several purposes of the present invention. It is important, therefore, that the claims be regarded as including such equivalent constructions insofar as they do not depart from the spirit and scope of the disclosed embodiments.

1. A method for determining attribute score levels from an assessment, the method comprising: for at least one item on the assessment, wherein the item tests for at least one attribute: generating a first distribution having a response propensity represented by a highest level of execution for each attribute tested by the item; for at least one score, determining an item threshold for the first distribution corresponding to a level of execution corresponding to the score; for at least one attribute tested by the item: generating a second distribution having a response propensity represented by a lowest level of execution for the attribute and the highest level of execution for all other attributes tested by the item, and determining a mean parameter for the second distribution; and determining an attribute score level for at least one score based on the at least one item threshold and the at least one mean parameter.
 2. The method of claim 1 wherein the first distribution comprises a standard normal distribution.
 3. The method of claim 1 wherein the item threshold for a first distribution corresponding to a first score is selected from a uniform distribution defined by Unif(−5, 5).
 4. (canceled)
 5. The method of claim 1 wherein the first distribution comprises a first distribution mean parameter, and wherein the first distribution mean parameter is greater than the mean parameter for each second distribution.
 6. The method of claim 1 wherein the item threshold corresponding to a first score is greater than an item threshold corresponding to a second score if the first score is greater than the second score.
 7. The method of claim 1 wherein a second distribution comprises a standard normal distribution.
 8. The method of claim 1 wherein the mean parameter for a second distribution is less than 0.
 9. The method of claim 1 wherein the mean parameter for a second distribution is selected from a uniform distribution defined by Unif(−10, 0).
 10. (canceled)
 11. (canceled)
 12. (canceled)
 13. A method for determining one or more examinee attribute mastery levels from an assessment, the method comprising: receiving a covariate vector for an examinee, wherein the covariate vector includes a value for each of one or more covariates for the examinee; and for each of one or more attributes: computing an examinee attribute value based on at least the covariate vector and one or more responses made by the examinee to one or more questions pertaining to the attribute on an assessment, and assigning an examinee attribute mastery level for the examinee with respect to the attribute based on whether the examinee attribute value surpasses one or more thresholds.
 14. (canceled)
 15. (canceled)
 16. A system for determining attribute score levels from an assessment, the system comprising: a processor; and a processor-readable storage medium in communication with the processor, wherein the processor-readable storage medium contains one or more programming instructions for performing a method of determining attribute score levels from an assessment, the method comprising: for at least one item on the assessment, wherein the item tests for at least one attribute: generating a first distribution having a response propensity represented by a highest level of execution for each attribute tested by the item, for at least one score, determining an item threshold for the first distribution corresponding to a level of execution corresponding to the score, for at least one attribute tested by the item: generating a second distribution having a response propensity represented by a lowest level of execution for the attribute and the highest level of execution for all other attributes tested by the item, and determining a mean parameter for the second distribution, and determining an attribute score level for at least one score based on the at least one item threshold and the at least one mean parameter.
 17. The system of claim 16 wherein the first distribution comprises a standard normal distribution.
 18. The system of claim 16 wherein the first distribution comprises a first distribution mean parameter, and wherein the first distribution mean parameter is greater than the mean parameter for each second distribution.
 19. The system of claim 16 wherein the item threshold corresponding to a first score is greater than an item threshold corresponding to a second score if the first score is greater than the second score.
 20. The system of claim 16 wherein a second distribution comprises a standard normal distribution.
 21. The system of claim 16 wherein the mean parameter for a second distribution is less than 0.
 22. (canceled)